An Exploratory Study of Social Media Analysis for Rare Diseases using Machine Learning Algorithms: A case study of Trigeminal Neuralgia

Author: Korkin Dmitry
Li Ruojun
Mombini Haadi
Tulu Bengisu
Zhang Yixin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2020

Rare diseases, which affect approximately 30 million Americans, are often poorly understood by clinicians because of unfamiliarity with the disease and a lack of research. Patients with rare diseases are often unfavorably treated, especially those with extremely painful chronic orofacial rare disorders. In the absence of structured knowledge, such patients often turn to social media to seek help from peers within patient-oriented communities, thereby generating tremendous amounts of unstructured data daily. We investigate whether machine learning can organize this unstructured data to help members of rare disease communities find relevant information more efficiently in real time. We chose Trigeminal Neuralgia (TN), an extremely painful rare disorder, as our case study and collected 20,000 social media TN posts. We categorized TN posts into Twitter (very short) and Facebook (short, medium, long) datasets based on message length and performed three clustering experiments. Results revealed that GSDMM outperformed both K-means and Spherical K-means in clustering Facebook posts, especially short messages, in terms of speed. For long messages, MDS reduction outperformed PCA when both were used with K-means and Spherical K-means. Our study demonstrated the need for further topic modeling within the high-level clusters, based on semantic analysis of the posts in each cluster.
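
The abstract compares K-means with Spherical K-means. As a minimal sketch of how the two differ, the assignment step below uses squared Euclidean distance for K-means and cosine similarity on unit-normalized vectors for Spherical K-means. The function names, the toy term-count vectors, and the centroids are illustrative assumptions, not the study's data or code.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def euclidean_assign(points, centroids):
    """K-means assignment step: nearest centroid by squared Euclidean distance."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(min(range(len(dists)), key=dists.__getitem__))
    return labels

def cosine_assign(points, centroids):
    """Spherical K-means assignment step: centroid with highest cosine similarity."""
    unit_centroids = [normalize(c) for c in centroids]
    labels = []
    for p in points:
        u = normalize(p)
        sims = [sum(a * b for a, b in zip(u, c)) for c in unit_centroids]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels

# Hypothetical term-count vectors for three posts and two centroids.
posts = [[1, 0], [8, 1], [4, 5]]
centroids = [[1, 0], [5, 5]]

euclidean_labels = euclidean_assign(posts, centroids)
cosine_labels = cosine_assign(posts, centroids)
```

In this toy data the second post uses mostly the first term but is long, so Euclidean distance pulls it toward the larger centroid while cosine similarity groups it by direction. This is one reason length-normalized similarity is often preferred for text of varying length.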

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)

Accelerating large-scale protein structure alignments with graphics processing units

Author: Becchi Michela
Korkin Dmitry
Pang Bin
Shyu Chi-Ren
Zhao Nan
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Large-scale protein structure alignment, an indispensable tool in structural bioinformatics, poses a tremendous challenge for computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues arising from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massively parallel computing power of GPUs.

Springer - Publisher Connector

arXiv.org e-Print Archive

Finding shortest and nearly shortest path nodes in large substantially incomplete networks

Author: Alderson David L.
Cui Hongzhu
Eisenberg Daniel A.
Elmokashfi Ahmed
Ganin Alexander
Kitsak Maksim
Korkin Dmitry
Linkov Igor
Publication date: 08/04/2022

Dynamic processes on networks, be it information transfer in the Internet, contagious spreading in a social network, or neural signaling, take place along shortest or nearly shortest paths. Unfortunately, our maps of most large networks are substantially incomplete due to the highly dynamic nature of networks, the high cost of network measurements, or both, rendering traditional path-finding methods inefficient. We find that shortest paths in large real networks, such as the network of protein-protein interactions (PPI) and the Internet at the autonomous system (AS) level, are not random but are organized according to latent-geometric rules. If nodes of these networks are mapped to points in latent hyperbolic spaces, shortest paths in them align along geodesic curves connecting endpoint nodes. We find that this alignment is sufficiently strong to allow for the identification of shortest path nodes even in the case of substantially incomplete networks. We demonstrate the utility of latent-geometric path-finding in problems of cellular pathway reconstruction and communication security.
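
To make the geometric idea concrete, here is a minimal sketch assuming nodes are already embedded as polar coordinates (r, theta) in the hyperbolic plane of curvature -1. It computes pairwise distance with the hyperbolic law of cosines and scores each candidate node by how much it lengthens the route between two endpoints. The "detour" score, the function names, and the coordinates are illustrative assumptions, not the procedure used in the paper.

```python
import math

def hyperbolic_distance(p, q):
    """Distance between p=(r, theta) and q=(r, theta) in the hyperbolic plane (curvature -1)."""
    r1, t1 = p
    r2, t2 = q
    # Hyperbolic law of cosines; cos() already handles angle wrap-around.
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(1.0, x))  # clamp guards against rounding below 1

def geodesic_detour(a, b, c):
    """Extra length of going a -> c -> b instead of a -> b; zero when c lies on the geodesic."""
    return (hyperbolic_distance(a, c) + hyperbolic_distance(c, b)
            - hyperbolic_distance(a, b))

# Hypothetical embedded nodes: two endpoints on opposite sides of the origin.
a = (2.0, 0.0)
b = (2.0, math.pi)
candidates = {
    "on_geodesic": (0.0, 0.0),           # the origin lies on the geodesic from a to b
    "off_geodesic": (2.0, math.pi / 2),  # same radius, perpendicular direction
}

# Rank candidate nodes by closeness to the geodesic (smallest detour first).
ranking = sorted(candidates, key=lambda name: geodesic_detour(a, b, candidates[name]))
```

Nodes with a small detour sit close to the geodesic between the endpoints, which is the sense in which they are plausible shortest-path nodes under the latent-geometric picture described above.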

TU Delft Repository

Digital Repository @ Iowa State University (ISU)

Phytonematode Peptide Effectors Exploit a Host Post‐Translational Trafficking Mechanism to the ER using a Novel Translocation Signal

Author: Baum Thomas J.
Davis Eric L.
Dhroso Andi
Hussey Richard S.
Korkin Dmitry
Liu Xunliang
Mitchum Melissa G.
Wang Jianying
Wang Xiaohong
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2020

Cyst nematodes induce a multicellular feeding site within roots called a syncytium. It remains unknown how root cells are primed for incorporation into the developing syncytium. Furthermore, it is an enigma how CLAVATA3/ESR (CLE) peptide effectors secreted into the cytoplasm of the initial feeding cell could affect plant cells so distant from where the nematode is feeding as the syncytium expands. Here we describe a novel translocation signal within nematode CLE effectors that is recognized by plant cell secretory machinery to redirect these peptides from the cytoplasm to the apoplast of plant cells. We show that the translocation signal is functionally conserved across CLE effectors identified in nematode species spanning three genera and multiple plant species, is operative across plant cell types, and can traffic other unrelated small peptides from the cytoplasm to the apoplast of host cells via a previously unknown post‐translational mechanism of ER translocation. Our results uncover an unprecedented mechanism of effector trafficking by any plant pathogen to date and illustrate how phytonematodes can deliver effector proteins into host cells and then hijack plant cellular processes for their export back out of the cell to function as external signaling molecules to distant cells.

Diposit Digital de la Universitat de Barcelona

Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism

Author: Broly Martin
Calderwood Michael A.
Corominas Castiñeira Roser
Fan Changyu
Ghamsari Lila
Hao Tong
Hill David E.
Horvath Steve
Iakoucheva Lilia M.
Kang Shuli
Korkin Dmitry
Kuang Xingyan
Lemmens Irma
Lin Guan Ning
Malhotra Dheeraj
Michaelson Jacob J.
Rodriguez Maria
Roth Frederick P.
Salehi-Ashtiani Kourosh
Sebat Jonathan
Shen Yun
Tam Stanley
Tasan Murat
Tavernier Jan
Trigg Shelly A.
Vacic Vladimir
Vidal Marc
Yang Xinping
Yi Song
Zhao Nan
Publication venue: Springer Science and Business Media LLC
Publication date: 19/04/2021

Increased risk for autism spectrum disorders (ASD) is attributed to hundreds of genetic loci. The convergence of ASD variants has been investigated using various approaches, including protein interactions extracted from the published literature. However, these datasets are frequently incomplete, carry biases and are limited to interactions of a single splicing isoform, which may not be expressed in the disease-relevant tissue. Here we introduce a new interactome mapping approach by experimentally identifying interactions between brain-expressed alternatively spliced variants of ASD risk factors. The Autism Spliceform Interaction Network reveals that almost half of the detected interactions and about 30% of the newly identified interacting partners represent contribution from splicing variants, emphasizing the importance of isoform networks. Isoform interactions greatly contribute to establishing direct physical connections between proteins from the de novo autism CNVs. Our findings demonstrate the critical role of spliceform networks for translating genetic knowledge into a better understanding of human diseases.

Structural Modeling of Protein Interactions by Analogy: Application to PSD-95

Author: Andrej Sali
Dmitry Korkin
Frank Alber
Fred P. Davis
Mary B. Kennedy
Min-Yi Shen
Tinh Luong
Vladan Lucic
Publication venue: Public Library of Science
Publication date: 01/01/2005

We describe comparative patch analysis for modeling the structures of multidomain proteins and protein complexes, and apply it to the PSD-95 protein. Comparative patch analysis is a hybrid of comparative modeling based on a template complex and protein docking, with a greater applicability than comparative modeling and a higher accuracy than docking. It relies on structurally defined interactions of each of the complex components, or their homologs, with any other protein, irrespective of its fold. For each component, its known binding modes with other proteins of any fold are collected and expanded by the known binding modes of its homologs. These modes are then used to restrain conventional molecular docking, resulting in a set of binary domain complexes that are subsequently ranked by geometric complementarity and a statistical potential. The method is evaluated by predicting 20 binary complexes of known structure. It is able to correctly identify the binding mode in 70% of the benchmark complexes compared with 30% for protein docking. We applied comparative patch analysis to model the complex of the third PSD-95, DLG, and ZO-1 (PDZ) domain and the SH3-GK domains in the PSD-95 protein, whose structure is unknown. In the first predicted configuration of the domains, PDZ interacts with SH3, leaving both the GMP-binding site of guanylate kinase (GK) and the C-terminus binding cleft of PDZ accessible, while in the second configuration PDZ interacts with GK, burying both binding sites. We suggest that the two alternate configurations correspond to the different functional forms of PSD-95 and provide a possible structural description for the experimentally observed cooperative folding transitions in PSD-95 and its homologs. More generally, we expect that comparative patch analysis will provide useful spatial restraints for the structural characterization of an increasing number of binary and higher-order protein complexes

Public Library of Science (PLOS)

Caltech Authors

Structural Similarity and Classification of Protein Interaction Interfaces

Author: Bin Pang
Chi-Ren Shyu
Dmitry Korkin
Nan Zhao
Vladimir N. Uversky
Publication venue: Public Library of Science
Publication date: 12/05/2011

Interactions between proteins play a key role in many cellular processes. Studying protein-protein interactions that share similar interaction interfaces may shed light on their evolution and could help elucidate the mechanisms behind the stability and dynamics of protein complexes. When two complexes share structurally similar subunits, the similarity of the interaction interfaces can be found through a structural superposition of the subunits. However, accurate detection of similarity between protein complexes containing subunits of unrelated structure remains an open problem.